Many complex networks show signs of modular structure, uncovered by community detection. Although
many methods succeed in revealing various partitions, it remains difficult to detect at what scale some 
partition is significant. This problem shows foremost in multi-resolution methods. We here introduce an 
efficient method for scanning for resolutions in one such method. Additionally, we introduce the notion of 
"significance" of a partition, based on subgraph probabilities. Significance is independent of the exact 
method used, so could also be applied in other methods, and can be interpreted as the gain in encoding a 
graph by making use of a partition. Using significance, we can determine "good" resolution parameters, 
which we demonstrate on benchmark networks. Moreover, optimizing significance itself also shows 
excellent performance. We demonstrate our method on voting data from the European Parliament. Our 
analysis suggests the European Parliament has become increasingly ideologically divided and that 
nationality plays no role. 



Networks appear naturally in many fields of science, and are often inherently complex structures. By
looking at the modular structure of a network we can reduce its complexity to some extent, yielding a
"bird's-eye view" of the network 1-3 .
Although there is no universally accepted definition of a community, there are some commonly accepted
principles. We denote by $G = (V, E)$ a graph with nodes $V$ and edges $E \subseteq V \times V$, where the
graph has $n = |V|$ nodes and $m = |E|$ edges, and is said to have a density of $p = m / \binom{n}{2}$. The
idea is that, in general, we want to reward links within communities with some weight $a_{ij}$, while we want
to punish missing links within communities with some weight $b_{ij}$. Working out this idea we arrive at



$$\mathcal{H}(\sigma) = -\sum_{ij} \left[ a_{ij} A_{ij} - b_{ij} (1 - A_{ij}) \right] \delta(\sigma_i, \sigma_j), \tag{1}$$



for the "cost" of a partition $\sigma$. Here $A_{ij}$ is the adjacency matrix, with $A_{ij} = 1$ if there is a
link between $i$ and $j$ and zero otherwise, $\sigma_i$ denotes the community of node $i$, and
$\delta(\sigma_i, \sigma_j) = 1$ if and only if $\sigma_i = \sigma_j$, and zero otherwise. This is a slightly
simplified version of the approach by Reichardt and Bornholdt 4 . We will restrict ourselves here to simple,
unweighted graphs.

Different weights $a_{ij}$ and $b_{ij}$ give rise to different methods. One can imagine, for example, taking the
number of common neighbours as weight $b_{ij}$, the distance of the shortest path or some transition probability
in a random walk. Many methods have been developed over the years, but the most noteworthy method is that of
modularity 5 , which uses $a_{ij} = 1 - p_{ij}$ and $b_{ij} = p_{ij}$, where $p_{ij}$ is some random null model.
It has risen to prominence because it showed encouraging results in various fields, ranging from ecology 6,7 and
biology 8,9 to political science 10 and sociology 11 .

Nonetheless modularity was found to be seriously flawed. Its biggest problem is the resolution limit 12,13 , which
states that modularity is unable to detect relatively small communities in large networks. We showed previously
that methods that use local weights (i.e. $a_{ij}$ and $b_{ij}$ are independent of the graph) do not suffer from
the resolution limit 14 , and are hence called resolution limit free. Within this framework there are relatively
few methods that are resolution limit free. One such method is the Constant Potts Model 14 (CPM). This model has
as weights $a_{ij} = 1 - \gamma$ and $b_{ij} = \gamma$, where $\gamma$ is a so-called resolution parameter (see
next paragraph), resulting in



$$\mathcal{H}(\sigma, \gamma) = -\sum_{ij} \left[ A_{ij} - \gamma \right] \delta(\sigma_i, \sigma_j). \tag{2}$$
Rewriting this in terms of communities, we arrive at



$$\mathcal{H}(\sigma, \gamma) = -\sum_c \left[ e_c - \gamma n_c^2 \right], \tag{3}$$

where $e_c$ is the number of edges within community $c$ (or twice that
number for undirected graphs, because of double counting in
$\sum_{ij} (A_{ij} - \gamma) \delta(\sigma_i, \sigma_j)$) and $n_c$ is
the number of nodes within community $c$. It can be seen as a variant
of the Reichardt and Bornholdt Potts model when choosing an
Erdős-Rényi (ER) null model, which assumes that each edge is included
independently with the same probability $p$. In the remainder of this
article, when speaking of a random graph, we refer to an ER random
graph, unless explicitly stated otherwise.
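
As a minimal sketch of how equations (2) and (3) relate, the following Python snippet evaluates the CPM cost both as a sum over node pairs and as a sum over communities. The function names, the list-of-lists adjacency matrix and the toy graph (two triangles joined by a single edge) are illustrative assumptions and do not come from the original text.

```python
from collections import Counter

def cpm_cost_pairwise(adj, sigma, gamma):
    """Equation (2): sum over all ordered node pairs (i, j), including i == j."""
    n = len(adj)
    return -sum(adj[i][j] - gamma
                for i in range(n) for j in range(n)
                if sigma[i] == sigma[j])

def cpm_cost_by_community(adj, sigma, gamma):
    """Equation (3): -sum_c [e_c - gamma * n_c^2], with e_c double counted."""
    n = len(adj)
    sizes = Counter(sigma)          # n_c for every community c
    e = Counter()                   # e_c, each undirected edge counted twice
    for i in range(n):
        for j in range(n):
            if adj[i][j] and sigma[i] == sigma[j]:
                e[sigma[i]] += 1
    return -sum(e[c] - gamma * sizes[c] ** 2 for c in sizes)

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

sigma = [0, 0, 0, 1, 1, 1]          # one community per triangle
print(cpm_cost_pairwise(adj, sigma, 0.5))
print(cpm_cost_by_community(adj, sigma, 0.5))
```

Both functions return the same value for any partition, because each community contributes $n_c^2$ ordered pairs (including $i = j$) and every internal edge appears twice in the symmetric adjacency matrix.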

It is not too difficult to show that any (local) minimum yields a
nice interpretation of the role of the resolution parameter $\gamma$.
Succinctly stated, communities have an internal density of at least
$\gamma$ and an external density of at most $\gamma$. The parameter
$\gamma$ can thus be seen as the desired density of the communities.
The central question in this paper is why and how we should choose
some resolution parameter $\gamma$.

Results 

Although CPM does not suffer from the resolution limit, there do
remain some problems of scale 15 . In particular, there is no a priori
way to choose a particular resolution parameter $\gamma$. We address
this issue in this paper from two complementary perspectives. First we
will detail how to efficiently scan different resolution parameters
$\gamma$ for CPM. Secondly, we introduce the notion of "significance"
of a partition (which is independent of any method). Both perspectives
help in choosing some particular resolution parameter $\gamma$. We
will demonstrate the method on benchmark networks, and show that both
scanning for the right resolution parameter and optimizing
significance itself show excellent performance. As an application
of our method, we analyse a network based on votes of the European
Parliament (EP).

Scanning resolutions. Often, various measures of stability (how
much the partition changes after some perturbation) are used
to determine whether a resolution parameter or a partition is
"good" 16-19 . In this section we look at stable "plateaus": ranges of
$\gamma$ where the same partition is optimal. If a partition is optimal
over the range $[\gamma_1, \gamma_2]$ then the communities have a
density of at least $\gamma_2$ and are separated by a density of at
most $\gamma_1$. Hence, the larger this stable "plateau", the more
clear-cut the community structure.

For $\gamma = 0$, the trivial partition of all nodes in a single
community is optimal (since in that case any cut will increase the cost
function). On the other hand, for $\gamma = 1$ the optimal partition is
to have each node in its own community. This idea holds in general: a
higher $\gamma$ gives rise to smaller communities.

The intuitive idea that a partition should remain optimal for some
(continuous) interval of $\gamma$ can be formalized. More precisely, if
$\sigma$ is an optimal solution for $\gamma_1$ and $\gamma_2$, then
$\sigma$ is also an optimal solution for all
$\gamma \in [\gamma_1, \gamma_2]$ (which was also remarked in the
supporting information of ref. 10 for a similar method).

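Theorem 1. Let $\mathcal{H}(\gamma, \sigma)$ be as in equation (2). If
$\sigma^*$ is optimal for both $\gamma_1$ and $\gamma_2$, that is,

$$\sigma^* = \arg\min_\sigma \mathcal{H}(\gamma_1, \sigma) = \arg\min_\sigma \mathcal{H}(\gamma_2, \sigma),$$

then $\sigma^* = \arg\min_\sigma \mathcal{H}(\gamma, \sigma)$ for $\gamma_1 \leq \gamma \leq \gamma_2$.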

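Proof. First observe that $\mathcal{H}(\gamma, \sigma)$ is linear in
$\gamma$, which can easily be seen from the definition. Suppose that
$\sigma^*$ is optimal in $\gamma_1$ and $\gamma_2$. Let
$\gamma = \lambda \gamma_1 + (1 - \lambda) \gamma_2$ with
$0 \leq \lambda \leq 1$; then by linearity of
$\mathcal{H}(\gamma, \sigma)$ in $\gamma$ and optimality of $\sigma^*$
we have, for any partition $\sigma$,

$$\mathcal{H}(\gamma, \sigma^*) = \lambda \mathcal{H}(\gamma_1, \sigma^*) + (1 - \lambda) \mathcal{H}(\gamma_2, \sigma^*) \leq \lambda \mathcal{H}(\gamma_1, \sigma) + (1 - \lambda) \mathcal{H}(\gamma_2, \sigma) = \mathcal{H}(\gamma, \sigma).$$

Hence $\mathcal{H}(\gamma, \sigma^*) \leq \mathcal{H}(\gamma, \sigma)$ and $\sigma^*$ is optimal for $\gamma \in [\gamma_1, \gamma_2]$.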
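As stated, $\mathcal{H}(\gamma, \sigma)$ is linear in $\gamma$, and we
can rewrite it slightly to emphasize its linearity

$$\mathcal{H}(\gamma, \sigma) = -\sum_{ij} \left[ A_{ij} - \gamma \right] \delta(\sigma_i, \sigma_j) = -\left[ E - \gamma N \right] \tag{4}$$

where $E := \sum_c e_c$ is the total number of internal edges and
$N := \sum_c n_c^2$ is the sum of the squared community sizes.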

It is less obvious how to detect whether a partition remains optimal
over some interval. Fortunately, it turns out that $N$ is monotonically
non-increasing in $\gamma$ (Theorem 2). Specifically, if both partitions
are optimal for both resolution parameters, then necessarily
$N_1 = N_2$, and so also $E_1 = E_2$. We therefore only need to find
those points at which $N(\gamma)$ changes, which can be done
efficiently using bisectioning on $\gamma$.
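
The bisectioning idea can be sketched as follows in Python. This is only an illustration under stated assumptions: `scan` and its tolerance `tol` are hypothetical names, and the optimiser is a brute-force enumeration of all partitions of a six-node toy graph (two triangles joined by one edge), standing in for whatever heuristic is used in practice. An interval is refined only when $N$ differs at its end points.

```python
def partitions(n):
    """Yield all partitions of n nodes as label lists (restricted growth strings)."""
    def grow(prefix, used):
        if len(prefix) == n:
            yield prefix
            return
        for label in range(used + 1):
            yield from grow(prefix + [label], max(used, label + 1))
    yield from grow([], 0)

def E_and_N(adj, sigma):
    """Internal edges E (double counted, as in equation (3)) and N = sum_c n_c^2."""
    n = len(adj)
    E = sum(adj[i][j] for i in range(n) for j in range(n) if sigma[i] == sigma[j])
    N = sum(sigma.count(c) ** 2 for c in set(sigma))
    return E, N

def optimal_N(adj, gamma):
    """Brute-force stand-in for a real optimiser: N of a minimum-cost partition."""
    def cost(sigma):
        E, N = E_and_N(adj, sigma)
        return -(E - gamma * N)      # equation (4)
    return E_and_N(adj, min(partitions(len(adj)), key=cost))[1]

def scan(optimise, lo, hi, tol=1e-3):
    """Sample N(gamma) on [lo, hi], bisecting only intervals where N differs."""
    profile = {lo: optimise(lo), hi: optimise(hi)}
    def bisect(a, b):
        # Stop when N is equal at both ends or the interval is narrower than tol.
        if profile[a] == profile[b] or b - a < tol:
            return
        mid = (a + b) / 2
        profile[mid] = optimise(mid)
        bisect(a, mid)
        bisect(mid, b)
    bisect(lo, hi)
    return sorted(profile.items())

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

profile = scan(lambda g: optimal_N(adj, g), 0.0, 0.9)
for gamma, N in profile:
    print(f"gamma = {gamma:.4f}  N = {N}")
```

On this toy graph the profile has a single step: one community of six nodes ($N = 36$) is optimal below $\gamma = 1/9$, and the two triangles ($N = 18$) above it. Most samples are spent near that breakpoint, which is the point of skipping intervals with equal $N$.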

Theorem 2. Let $\sigma_z = \arg\min_\sigma \mathcal{H}(\gamma_z, \sigma)$, $z = 1, 2$. Furthermore, let
$N_z = \sum_c n_c^2(\sigma_z)$, where $n_c(\sigma_z)$ denote the community sizes of the
partition $\sigma_z$. If $\gamma_1 < \gamma_2$ then $N_1 \geq N_2$.

Proof. The two partitions $\sigma_1$ and $\sigma_2$ have the costs
$\mathcal{H}(\gamma_1, \sigma_1) = -E_1 + \gamma_1 N_1$ and
$\mathcal{H}(\gamma_2, \sigma_2) = -E_2 + \gamma_2 N_2$. Both partitions
are optimal for the corresponding resolution parameters and we
obtain

$$-E_1 + \gamma_1 N_1 \leq -E_2 + \gamma_1 N_2,$$
$$-E_2 + \gamma_2 N_2 \leq -E_1 + \gamma_2 N_1.$$

Summing both inequalities yields

$$-(E_1 + E_2) + \gamma_1 N_1 + \gamma_2 N_2 \leq -(E_1 + E_2) + \gamma_1 N_2 + \gamma_2 N_1$$

and so $\gamma_1 (N_1 - N_2) \leq \gamma_2 (N_1 - N_2)$. Since
$\gamma_1 < \gamma_2$ we obtain that $N_1 \geq N_2$.

Significance. Another, complementary, point of view would be to
have some quality measure to state at what resolution $\gamma$ the
partition is "good". On reflection, it is ironic that we return to the
question of what resolution yields a good partition. After all, the
initial goal of modularity was in fact to decide on some resolution
level: where to cut a particular dendrogram 5 .

Although modularity compares the number of edges within a
community to a random graph, this does not provide any
"significance" of a partition, since random graphs and sparse graphs
without community structure can also have quite high modularity 20-22 .
Other approaches have been suggested that try to estimate in some way
the significance of a partition. One recent approach, known as
"surprise", focuses on the probability to find $E$ internal edges in a
random graph 23,24 . Another more "local" approach keeps the degrees
constant and asks what the probability is to connect so many edges to
a given community 25 , which led to a method known as OSLOM 26 . A third
approach focuses on the likelihood of generating a graph given a
certain partition and degree distribution 27 , known as stochastic block
models.

But when thinking about the significance of a partition, most
methods go about it the wrong way around 23-27 . We do not want to
know the probability that a "fixed" partition contains at least $E$
internal edges, but whether a partition with at least $E$ internal
edges can be found in a random graph, which is the approach we will
take in this paper. After all, community detection involves searching
for some good partition, so we should focus on the probability of
finding such a good partition in a random graph. In a way, the earlier
approaches assume the partition is "fixed" and the edges are randomly
distributed, whereas we try to find a partition in a random graph,
which can result in quite different statistics. Stated somewhat
differently, earlier approaches ignore that a simple permutation of
nodes still contains the same partition (one only needs to identify
the permutation to uncover the original partition), whereas our
approach does account for that. We illustrate the differences in the
two approaches in Figure 1.

Figure 1 | Probabilities for partitions. Consider the example partition
provided in (a). The objective is to somehow estimate how (un)likely it
is that such a partition occurs in a random graph, that is, the
significance of a partition. In (b) and (c) we show the same graph, but
in (b) the same partition as in (a) is used, while in (c) a partition
with more internal edges is used. For illustrative purposes, the graph
is generated by randomly rewiring some of the edges and permuting the
nodes of the original graph in (a). Earlier approaches keep the
partition fixed, and focus on the probability that so many edges fall
within the given partition, as illustrated in (b). Yet this ignores
that there might exist some partition within this graph that has more
internal edges. Therefore, we focus on the probability of finding such
a dense partition in random graphs, as illustrated in (c).

Nonetheless, these earlier approaches might work quite well. For
example, explicitly calculating the probability to find $E$ internal
edges seems to yield good results 23,24 . Obviously, the two
probabilities (surprise and our approach) are not completely
independent. If the probability of finding many edges within a
partition is high, then surely finding a partition with many edges
should be easy. On the other hand, if the probability of finding a
dense partition is low, then surely the probability that a partition
contains many edges is low as well. In between these two extremes is a
grey area, and a more in-depth analysis is required for understanding
it exactly.

Although exact results for finding a partition in a random graph
are hard to obtain, we do get some interesting asymptotic results. The
asymptotic limit we analyse concerns the probability to find a
partition into a fixed number of communities with a certain density for
$n \to \infty$ in a random graph. The probability for finding a certain
partition can be reduced to finding some dense subgraphs in a random
graph. We consider subgraphs of size $n_c = sn$ proportional to $n$,
with $0 < s < 1$, and of a fixed density $q$. Our central result
concerning these subgraph probabilities is the following (the proof
can be found in the Methods section). We here use the asymptotic
notation $f = \Theta(g)$ to denote that $g$ is an asymptotic upper and
lower bound for $f$.



Theorem 3. The probability that a subgraph of size $n_c$ and density $q$
appears in a random graph of size $n$ and density $p$ is asymptotically

$$\Pr\left( S(n_c, q) \subseteq \mathcal{G}(n, p) \right) = e^{\Theta\left( -\binom{n_c}{2} D(q \| p) \right)} \tag{5}$$

where $D(q \| p)$ is the Kullback-Leibler divergence 28

$$D(q \| p) = q \log \frac{q}{p} + (1 - q) \log \frac{1 - q}{1 - p}. \tag{6}$$

For each $q \neq p$ the probability decays as a Gaussian, with a rate
depending on the "distance" between $p$ and $q$ as expressed by the
Kullback-Leibler divergence. Furthermore, the larger the subgraph,
the less likely it is that a subgraph of density different from $p$ can
be found. Combining these probabilities we arrive at the following
approximation for the probability for a partition to be contained in a
random graph

$$\Pr(\sigma) = \prod_c \exp\left( -\binom{n_c}{2} D(p_c \| p) \right) \tag{7}$$

where $p_c$ is the density of community $c$. We then define the
significance as

$$S(\sigma) = -\log \Pr(\sigma) = \sum_c \binom{n_c}{2} D(p_c \| p). \tag{8}$$

Notice that for the two trivial partitions of (1) all nodes in a single
community ($\gamma = 0$) or (2) each node in its own community
($\gamma = 1$), the significance is zero (assuming no self-loops).
Since the significance is non-negative (because the Kullback-Leibler
divergence is non-negative), there will most likely be some partition
in between these two extremes ($0 < \gamma < 1$) which yields a
non-zero significance.
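
To make equation (8) concrete, here is a minimal Python sketch that computes the significance of a partition of a simple undirected graph. It assumes natural logarithms, a density $p_c = e_c / \binom{n_c}{2}$ with each undirected edge counted once, and $0 < p < 1$; the function names and the toy graph (two triangles joined by one edge) are illustrative only.

```python
from math import log

def kl_bernoulli(q, p):
    """Binary Kullback-Leibler divergence D(q || p), using 0 log 0 = 0."""
    d = 0.0
    if q > 0:
        d += q * log(q / p)
    if q < 1:
        d += (1 - q) * log((1 - q) / (1 - p))
    return d

def significance(adj, sigma):
    """Equation (8): sum over communities of binom(n_c, 2) * D(p_c || p)."""
    n = len(adj)
    m = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    p = m / (n * (n - 1) // 2)           # overall graph density
    communities = {}
    for node, c in enumerate(sigma):
        communities.setdefault(c, []).append(node)
    S = 0.0
    for nodes in communities.values():
        pairs = len(nodes) * (len(nodes) - 1) // 2
        if pairs == 0:
            continue                      # a singleton has no possible edges
        e_c = sum(adj[i][j] for a, i in enumerate(nodes) for j in nodes[a + 1:])
        S += pairs * kl_bernoulli(e_c / pairs, p)
    return S

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

print(significance(adj, [0, 0, 0, 0, 0, 0]))   # all nodes together
print(significance(adj, [0, 1, 2, 3, 4, 5]))   # every node alone
print(significance(adj, [0, 0, 0, 1, 1, 1]))   # the two triangles
```

The two trivial partitions give zero, while the two-triangle partition gives $6 \log(15/7)$, since each triangle has three possible edges, all present, against an overall density of $7/15$.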

Encoding gain. Notice that the Kullback-Leibler divergence can be
interpreted as a kind of entropy difference. It can be written as

$$D(q \| p) = H(q, p) - H(q) \tag{9}$$

where $H(q)$ is the binary entropy and $H(q, p)$ is the cross entropy

$$H(q) = -q \log q - (1 - q) \log(1 - q), \tag{10}$$

$$H(q, p) = -q \log p - (1 - q) \log(1 - p). \tag{11}$$

Hence, it measures the difference in entropy between $p$ and $q$,
assuming that $q$ is the "correct" probability.

This points to a possible interpretation of the significance
$S(\sigma)$ in terms of encoding of the graph. Suppose we are requested
to compress the graph $G$, and we do so using the simplest possible
framework: for each possible edge we indicate whether it is present or
not. Using the average graph density $p$, by Shannon's source coding
theorem 28 , the optimal code lengths are $-\log p$ for indicating an
edge is present and $-\log(1 - p)$ for indicating an edge is absent.
Now suppose that for some community we have the actual density $q$.
The expected code length using the average graph density is then
$H(q, p)$. If we use the actual density $q$ however, we obtain an
expected code length of $H(q)$. The gain in coding efficiency by using
$q$ instead of $p$ is then $D(q \| p)$. Doing so for all
$\binom{n_c}{2}$ possible edges, and for all communities, then yields
the significance (we hence don't count the external edges).
Significance can thus be regarded as the gain in encoding a graph by
making use of a partition.
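
The coding argument can be checked numerically with a few lines of Python. This is a sketch with made-up numbers: the densities `q` and `p` and the community size `n_c` are hypothetical, both densities are assumed to lie strictly between 0 and 1, and code lengths are in nats because natural logarithms are used.

```python
from math import comb, log

def entropy(q):
    """Binary entropy H(q), equation (10)."""
    return -q * log(q) - (1 - q) * log(1 - q)

def cross_entropy(q, p):
    """Cross entropy H(q, p), equation (11)."""
    return -q * log(p) - (1 - q) * log(1 - p)

def kl(q, p):
    """Kullback-Leibler divergence D(q || p), equation (6)."""
    return q * log(q / p) + (1 - q) * log((1 - q) / (1 - p))

q, p, n_c = 0.8, 0.2, 20                          # hypothetical values
gain_per_pair = cross_entropy(q, p) - entropy(q)  # saving per possible edge
community_gain = comb(n_c, 2) * gain_per_pair     # one term of equation (8)

print(gain_per_pair, kl(q, p))
print(community_gain)
```

The per-pair saving agrees with $D(q \| p)$ up to floating-point error, and multiplying by the $\binom{n_c}{2}$ possible edges gives the contribution of that community to the significance.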

Using significance. There are two ways to use significance. Firstly, we
could use significance to select a particular resolution parameter
$\gamma$. As was made clear in the subsection on scanning resolutions,
we don't have to scan $\gamma \in [\gamma_1, \gamma_2]$ if
$N(\gamma_1) = N(\gamma_2)$. If in addition we are only interested in
the $\gamma$ for which $S(\sigma)$ is maximal, we need only scan those
ranges for which the significance is maximal (taking a greedy
approach), similar to root-finding bisectioning.
Figure 2 | Scanning results for directed and hierarchical benchmark graphs. We display the sum of squared community sizes $N = \sum_c n_c^2$, the total internal
edges $E = \sum_c e_c$ and the significance $S = \sum_c \binom{n_c}{2} D(p_c \| p)$ of each partition (all on a logarithmic axis on the left). The VI (on a linear axis on the right) is
calculated over the various results returned by running a stochastic algorithm. If the VI is low, this indicates that the partitions found by the algorithm are
(almost) the same. The black dashed line indicates the expected (maximal) significance of an equivalent random graph, which is estimated to be
about $n \log n \approx 6{,}908$.



Secondly, we could optimise significance itself. We use an
approach similar to the Louvain method 29 for optimizing
significance (see Methods). Notice that using significance as an
objective function is not resolution limit free, contrary to CPM 14 .
After all, given a partition and a graph, pick a subgraph that consists
of only a single community. Then the significance $S(\sigma)$ of that
partition, defined on the subgraph, equals 0, since
$D(p_c \| p) = 0$. Since this constitutes the minimum, it is unlikely
that no other partition provides a higher significance. Hence, the same
partition no longer (necessarily) remains optimal on all
community-induced subgraphs, and the method is hence not resolution
limit free.

Resolution profile. Scanning the resolution parameters using
bisectioning seems to work quite well on LFR benchmark networks 30 , as
displayed in Figure 2. These benchmark networks have $n = 10^3$ nodes,
an average degree $\langle k \rangle = 20$ and a maximum degree of 50,
with degrees following a power-law distribution $k^{-\tau_k}$ with
$\tau_k = 2$. The community sizes range between 20 and 100, and are
distributed according to $n_c^{-\tau_c}$ with $\tau_c = 1$. This
corresponds to the settings used for comparing several algorithms 31 .
The proportion of internal links can be controlled by a so-called
mixing parameter $0 \leq \mu \leq 1$, so that for $\mu = 0$ communities
are easily detectable, whereas this becomes increasingly difficult for
higher $\mu$. For the hierarchical benchmark the mixing parameter
$\mu_1$ controls the coarser level and $\mu_2$ controls the finer
level. For more details, we refer to Lancichinetti, Fortunato &
Radicchi 30 .

From Figure 2 it is quite clear that both $N$ and $E$ are stepwise
decreasing functions of $\gamma$. The plateaus indeed correspond to the
planted partition for the benchmark network. The "stability" of a
partition is reported in terms of the average pairwise variation of
information (VI) between the various results of multiple runs of
the algorithm. The VI measure can be interpreted as a distance
between partitions 32 , so a low value indicates the results are
relatively stable. Indeed, in the range of the plateau, the VI is
relatively low (near 0), indicating the partition is relatively stable.
Hence, using such heuristics, it seems possible to scan for "stable"
plateaus of resolution values. Moreover, significance is highest in the
region of the plateaus, and thus seems to be able to point to
"meaningful" resolutions for these networks.
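
For readers unfamiliar with the VI, the following Python sketch computes it for two partitions given as label lists, together with the average pairwise VI over several runs. It uses the standard definition $VI = H(A \mid B) + H(B \mid A)$ with natural logarithms; the function names and example partitions are illustrative assumptions, not the implementation used for Figure 2.

```python
from collections import Counter
from itertools import combinations
from math import log

def variation_of_information(a, b):
    """VI(a, b) = H(a | b) + H(b | a) for two label lists of equal length."""
    n = len(a)
    size_a, size_b = Counter(a), Counter(b)
    overlap = Counter(zip(a, b))          # joint counts of label pairs
    vi = 0.0
    for (x, y), n_xy in overlap.items():
        vi -= (n_xy / n) * (log(n_xy / size_a[x]) + log(n_xy / size_b[y]))
    return vi

def mean_pairwise_vi(runs):
    """Average VI over all pairs of partitions returned by repeated runs."""
    pairs = list(combinations(runs, 2))
    return sum(variation_of_information(a, b) for a, b in pairs) / len(pairs)

# Hypothetical results of three runs on four nodes.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
print(variation_of_information(runs[0], runs[1]))  # same partition, relabelled
print(variation_of_information(runs[0], runs[2]))  # maximally different here
print(mean_pairwise_vi(runs))
```

The first two runs describe the same partition under different labels, so their VI is zero; a value near zero across all pairs of runs is what marks a stable plateau.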

For hierarchical LFR benchmark graphs 30 results are similar
(Figure 2). This network has $n = 10^3$ nodes, and each node has a
degree of $k_i = k = 20$. It consists of 10 large communities of 100
nodes each, and each large community is composed of 5 smaller
communities of 20 nodes each. We observe two plateaus for
$\mu_2 = 0.1$ (we have used $\mu_1 = 0.1$ for both results),
corresponding to the two levels of the hierarchy. For these plateaus
the VI is near zero, indicating quite stable results. For
$\mu_2 = 0.5$ the two plateaus have merged into a single plateau. The
smaller communities are more significant for $\mu_2 = 0.1$. This makes
sense, since the smaller communities are quite well defined for this
regime, while the larger communities are less clearly defined.
Interestingly, when the two plateaus merge for $\mu_2 = 0.5$, the
significance is lower than for $\mu_2 = 0.1$. Indeed, the communities
are less clearly defined for $\mu_2 = 0.5$ than for $\mu_2 = 0.1$.
Again, this makes sense, as the smaller communities are much less
clearly defined, while most links still fall within the larger
community (since $\mu_1 = 0.1$).

ER graphs. Applying the same technique as in the previous
subsection to ER graphs, we obtain a resolution profile, which
shows a particular transition (Figure 3a). This transition can be
explained by the asymptotics of significance. As the graph grows, and
$n \to \infty$, the probability in equation (7) $\Pr(\sigma) \to 0$ for
$p_c \neq p$. This indicates that it becomes increasingly difficult to
find (relatively large) subgraphs of a density different from $p$, and
in the limit we expect only to find subgraphs of about density $p$.
For $\gamma < p$ we then expect to find one large community, while for
$\gamma > p$ we expect to obtain each node in its own community,
thereby explaining the transition around $\gamma^* \approx p$. The
asymptotic analysis ignores the fact that the number of communities
may grow with the number of nodes. Therefore, it misses the fact that
small communities may have a density of $p_c > p$, which explains the
somewhat slower increase of the number of communities for
$\gamma > p$.

Analysing how significance behaves in ER graphs provides us with
a baseline to compare to observed significance values. Obviously, the
maximum significance scales with the size of the graph. In particular,
it seems to scale as $n \log n$ (Figure 3b). Compared to the benchmark
graphs (Figure 2), the significance found in random graphs is rather
low, so that significance shows little to no sign of any community
structure in ER graphs (although there will be a non-trivial partition
obtaining this maximum significance). By comparing the observed
significance in any graph to $n \log n$, one is thus able to assess to
what extent the observed community structure is significant. We believe
this represents a first step towards fully fledged hypothesis testing
of the significance of community structure.

Maximizing significance benchmarks. We have tested the two
methods: (1) using significance to choose a $\gamma$ in CPM; and (2)
optimizing significance itself. We used the standard LFR benchmark,
with the same parameters as for Figure 2 for the "big"
communities, while the "small" communities range from 10 to 50 nodes,
for both $n = 1{,}000$ and $n = 5{,}000$. The results are displayed in
Figure 4. We measure the performance using the normalized
mutual information (NMI) 31 , with NMI = 1 indicating the method
uncovered the planted partition exactly. It is clear that using
significance to scan for the best $\gamma$ parameter for CPM works
quite well. Surprisingly however, optimizing significance itself
results in a slightly worse performance than scanning for the optimal
$\gamma$ parameter for CPM for some settings. This is presumably due to
some local optima in which the significance optimization gets
stuck, while this is not the case for CPM. Nonetheless, optimizing
significance works quite well, and seems to outperform Infomap 33,34 ,
which was previously shown to perform well 31 . The OSLOM method
performs relatively well, although not as well as using significance to
scan for the best $\gamma$ parameter for CPM. This method is aimed at
overlapping communities, so an adjusted NMI 35 was used to
account for that, which still equals 1 if it uncovers the planted
partition exactly. No significance values are reported for OSLOM, since
there is no version of this measure adjusted for overlapping
communities (yet). Modularity clearly shows signs of the resolution
limit 12 , as it has difficulties detecting smaller communities in
relatively large networks. In general, all methods have a similar
computational complexity and use (variants of) the Louvain method 29 .
Detecting the optimal resolution value $\gamma$ for CPM involves
running the Louvain method multiple times, which obviously takes more
time.

Calculating the significance $S^*$ of the planted partition, we see
that in general, whenever a method correctly finds the planted
communities (i.e. NMI = 1), the significance of the partition found
by that algorithm is the same, so that $S = S^*$ (second row of
Figure 4). We observe a decrease in significance for increasing $\mu$,
as expected (third row of Figure 4). At the point where the
significance of the planted partition drops below the significance of
an equivalent random graph, $S^* < \langle S \rangle$, no method seems
able to correctly detect the communities. This suggests that
significance accurately captures whether there is some partition
present in the network or not. Before this point, whenever a method is
unable to detect the planted communities, the significance of that
"incorrect" partition is lower than that of the planted partition,
$S < S^*$, indicating that the planted partition is of maximal
significance.

European parliament. We demonstrate the method on networks of
the European Parliament (EP) from 1979-2009, based on the so-called
roll call votes, in which each vote of a member of parliament (MEP)
for or against a certain proposal is recorded (these do not constitute
all votes in the EP though), similar to an analysis of the U.S.
Senate 10 . Over this whole period, a total of almost 16 million votes
were cast, by a little over 2,500 different MEPs on more than 21,000
issues. For each parliamentary year (roughly from mid-June to mid-June
the next year), we constructed a network in which there is a link
between two MEPs whenever they vote more in accord than average. We
only take into account votes for which both MEPs cast a yea or nay vote
(instead of abstaining, not voting or being absent). We used data
from Simon Hix 36 .
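
One possible reading of this construction is sketched below in Python. The text does not spell out how "more in accord than average" is computed, so the sketch assumes that agreement is the fraction of jointly cast yea/nay votes on which two MEPs vote the same way, and that "average" is the mean of this fraction over all MEP pairs with at least one shared vote. The vote encoding (+1 yea, -1 nay, 0 otherwise), the function name and the example data are hypothetical.

```python
from itertools import combinations

def agreement_network(votes):
    """Return the set of linked pairs (i, j), i < j.

    votes[i][k] is +1 (yea), -1 (nay) or 0 (abstained, absent or not voting)
    for MEP i on issue k.
    """
    agreement = {}
    for i, j in combinations(range(len(votes)), 2):
        # Only issues on which both MEPs cast a yea or nay vote count.
        shared = [(a, b) for a, b in zip(votes[i], votes[j]) if a != 0 and b != 0]
        if shared:
            agreement[(i, j)] = sum(a == b for a, b in shared) / len(shared)
    if not agreement:
        return set()
    average = sum(agreement.values()) / len(agreement)
    # Link two MEPs when they agree more often than the average pair does.
    return {pair for pair, frac in agreement.items() if frac > average}

# Made-up roll call data for four MEPs on four issues.
votes = [
    [ 1,  1, -1,  1],
    [ 1,  1, -1,  0],
    [-1, -1,  1,  1],
    [-1,  0,  1,  1],
]
print(sorted(agreement_network(votes)))
```

With these made-up votes the network links MEPs 0 and 1 and MEPs 2 and 3, the two pairs that agree on every vote they both cast.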

The MEPs are elected for a five-year period from national member
states, and each MEP is associated with a national party. In total we
can discern 169 national parties over the whole period, but usually
parties and MEPs organise themselves in political groups (EP groups)
that correspond to some ideological views, ranging from liberalism to
socialism and from conservatives to progressives. Not all MEPs
organise themselves in EP groups; these are known as Non-Attached
(NA) members. Although the EP has the power to choose the
European Commission (not per individual commissioner, but as a
whole), MEPs do not need to organise themselves in governing parties
and opposition. Nonetheless, various coalitions are formed, and
from time to time the largest groups have collaborated in a grand
coalition of sorts 37 . In short, we can create a partition on three
different aspects of the MEPs: (1) their EP group; (2) their national
party; and (3) their member state. In addition, we obtain the partition
that maximizes significance.

Figure 3 | Results for ER graphs. In (a) we show that there is a transition around $\gamma = p$, the density of the graph. This transition can be explained by the
subgraph probabilities calculated in this paper, which suggest that asymptotically, a random graph only contains subgraphs of about the same density (of
size proportional to $n$). In (b) we show the significance of random graphs, which seems to scale approximately with $n \log n$.

Figure 4 | Benchmark results for significance. Finding the optimal resolution value for CPM using significance seems to work best (first row), where an
NMI of 1 indicates the algorithm uncovers exactly the planted partition (OSLOM might return overlapping communities, so we used an adjusted NMI for
that 35 , which still equals 1 if the partition is correct). Optimizing significance itself also works rather well. We tested two different community size distributions: the small
communities are between 10 and 50 nodes, and the big communities between 20 and 100 nodes. The resolution limit is clearly visible for modularity,
which shows especially for small groups in large networks. Whenever the significance $S$ found by a method is higher than the significance of the
planted partition $S^*$, the planted partition is no longer the optimal partition from the significance point of view (second row). If the significance of the
planted partition $S^*$ is lower than the significance of an equivalent random graph $\langle S \rangle$, no method seems able to correctly detect the planted partition (third
row).

We show the normalized significance (i.e. normalized by
$\langle S \rangle \approx n \log n$) for the four different possible
partitions in Figure 5 from 1979 (the first EP) to 2008 (the sixth EP).
Given the (sub)national constituencies of elected MEPs, one particular
concern is that the EP is governed by national interests, rather than
some common European interest. Our results clearly show that neither a
partition based on national party nor a partition based on member
state is significant. To be clear, this does not imply that MEPs of the
same national party do not vote similarly (because they do); rather, it
means they vote very similarly to MEPs of other parties. For member
states, however, the division seems to run across member states, and
MEPs of the same member state do not necessarily vote in a similar
fashion. This shows that in general MEPs do not vote along national
lines, although for certain votes the national background may play a
role 38,39 .

The partition in EP groups shows 5 to 15 times the significance of a
random graph, making it quite significant. Whereas the significance of
the partitions into member states and national parties remains almost
constant throughout time, that of the partition into EP groups
increases considerably from 1979 to 2008, with an all-time low of
$S / \langle S \rangle = 3.9$ in 1981 and a maximum of
$S / \langle S \rangle = 16.8$ in 2001, a more than fourfold increase.
One possible explanation of the general increase in divisiveness is
that the EP has become more powerful over the years, so that
competition over important issues has taken the lead 37,38,40 .
Besides a general trend upwards, there seems to be a particularly
large jump between 1995 and 1996. One possible explanation is that
Austria, Finland and Sweden entered the European Union in 1995,
whereafter MEPs were elected to parliament in 1995 and 1996. On
the other hand, the accession of Eastern European countries in 2004
and Eastern Balkan countries in 2007 did not seem to increase the
divisiveness. The maximum significance closely follows the same
trend as the EP group partition, suggesting the two are related.



We have also analysed the sixth parliament for the year 2008,
using CPM and significance, to see what scales of community
structure are present. We show results for $\gamma = 0.5$ and
$\gamma = 0.8$, with the latter corresponding to the maximal
significance for CPM. Clearly, the communities have a quite high
internal density, and are quite strongly connected amongst each other,
as is also clear from the adjacency matrices displayed in Figure 5.

At $\gamma = 0.5$ CPM groups together the Greens/European Free
Alliance (G/EFA) and the European United Left/Nordic Green Left
(EUL/NGL), which are both left-wing environmental parties. The
Party of European Socialists (PES) joins the two other leftist parties
at a somewhat higher resolution of $\gamma = 0.8$. The more
conservative parties of the Union for Europe of the Nations (UEN) and
the Alliance of Liberals and Democrats for Europe (ALDE) seem to join
forces with the more centrist European People's Party-European
Democrats (EPP-ED). The eurosceptic Independence/Democrats (IND/DEM)
group divides itself between the right-wing and the left-wing bloc,
although some members constitute a separate bloc with other
Non-Attached (NA) MEPs, who themselves also split across the two large
blocs. The partition maximizing significance is different still, but
shows a similar grouping of EP groups, in addition to several smaller
communities. Surprisingly however, a part of UEN is joined with
PES, although they seem ideologically more remote.

These three different partitions highlight different aspects of the
voting network. The partition maximizing significance for CPM (at
$\gamma = 0.8$) seems to highlight a more or less traditional partition into left
and right wing politics 37 . The partition for $\gamma = 0.5$ seems to reveal a
grand coalition 37 , with mainly the green-left differing from the rest.
The partition maximizing significance itself seems to highlight some
interesting split of the UEN. In conclusion, the EP shows signs of
multiple possible partitions, and significance seems to point to some
interesting partitions.

Discussion 

We have presented in this paper a method to find significant scales in
community structure. Firstly, we introduced a bisectioning method
allowing a fast and accurate construction of a resolution profile.
Secondly, we suggested a measure based on subgraph probabilities
in order to state what partitions are significant. This measure can be
interpreted as the gain in encoding a graph by making use of a
partition. We showed significance is able to accurately portray partitions
in benchmarks. Additionally, we showed on an empirical
example using voting data of the European Parliament that this measure
conveys meaningful information in that setting. Significance
seems to be closely related to the measure of surprise 23,24 and to
stochastic block models 27 , relationships we hope to explore further
in the future.

Figure 5 | Results for the European Parliament (EP). In (a) we show the significance of four different possible partitions throughout time: the partition
that maximizes significance and partitions based on the affiliation of each member of parliament (MEP) to an EP group, a national party or a member
state. In (b) we show the resolution profile for the sixth EP in the parliamentary year 2008 (June 13, 2008-June 12, 2009). Besides the other quantities, we
also show the similarity as measured by the NMI to partitions based on the EP groups, the national parties and the member states. In (c), (d) and (e) we
show what such a partition looks like at maximum significance and at two different resolution values $\gamma = 0.5$ and $\gamma = 0.8$ respectively. The latter
corresponds to the partition that maximizes significance. The top shows the division in parliament, while the bottom shows the adjacency matrix ordered
the same as the parliament. The division in communities is indicated by the grouping of the seats in parliament and the black lines in the adjacency matrix,
while the EP groups are indicated by colour. For a key to the abbreviations of the EP parties, we refer to the main text.

We conjectured that the maximum significance scales as $\langle S \rangle \sim n \log n$ for
random graphs, which allows researchers to compare the observed
significance to the expected significance. It constitutes a first step
towards fully fledged hypothesis testing of the significance of partitions.
Nonetheless, a proof of this behaviour is lacking so far.
Moreover, the standard error still needs to be estimated, although
simulations show it is relatively small. Furthermore, the significance
is currently based on Erdős-Rényi graphs, but it might be more
realistic to take the degree distribution into account 3 . Significance
is not only useful for partitions found using community detection,
but also for partitions based on other node characteristics 41 , such as
school grades 42 , gender 43 , or dormitories 44 , similar to what we did for
the European Parliament, and as such we deem it to be a valuable
contribution to analysing partitions in complex networks.

Methods 

Subgraph probabilities. We write $G \in \mathcal{G}(n,p)$ for a random graph $G$ from $\mathcal{G}(n,p)$, such
that each edge has independent probability $p$ of being included in the graph, the usual
Erdős-Rényi (ER) graphs. We use $|G| := |V(G)| = n$ for the number of nodes and
$\|G\| := |E(G)| = m$ for the number of edges. We use $H \subseteq G$ to denote the fact that $H$ is
an induced subgraph of $G$. We write $\Pr(H \subseteq \mathcal{G}(n,p))$ for the probability that $H$ is an
induced subgraph of a $G \in \mathcal{G}(n,p)$. Let $S(n_c, m_c) = \{G \mid |G| = n_c, \|G\| = m_c\}$ denote the
set of all graphs with $n_c = |G|$ vertices and $m_c = \|G\|$ edges. Furthermore, we slightly
abuse notation and write $\Pr(S(n_c, m_c) \subseteq \mathcal{G}(n,p))$ for the probability that a graph
$G \in \mathcal{G}(n,p)$ contains one of the graphs in $S(n_c, m_c)$, i.e.

$$\Pr(S(n_c, m_c) \subseteq \mathcal{G}(n,p)) = \Pr\Big(\bigcup_{H \in S(n_c, m_c)} H \subseteq \mathcal{G}(n,p)\Big).$$

Let us denote by $X$ the random variable that represents the number of occurrences
of a subgraph with $n_c$ vertices and $m_c$ edges in a random graph. Let $X_H$ be the indicator
value that specifies whether a subgraph $H$ of order $n_c = |H|$ in the random graph
equals one of the graphs in $S(n_c, m_c)$, which of course comes down to

$$X_H = \begin{cases} 1 & \text{if } \|H\| = m_c \text{ and } |H| = n_c, \\ 0 & \text{otherwise.} \end{cases}$$

We can then write $X = \sum_H X_H$ where the sum runs over all $\binom{n}{n_c}$ possible subgraphs
$H$. Obviously then, $\Pr(X > 0) = \Pr(S(n_c, m_c) \subseteq \mathcal{G}(n,p))$. By Cauchy-Schwarz's
inequality $E(XY)^2 \le E(X^2)E(Y^2)$ and Markov's inequality $\Pr(X \ge a) \le \frac{E(X)}{a}$ we
obtain the following bounds

$$\frac{E(X)^2}{E(X^2)} \le \Pr(X > 0) \le E(X). \tag{12}$$

This way of estimating probabilities is known as the second moment method 45 . 
It is convenient to define the probability that a graph of $n_c$ nodes contains $m_c$ edges

$$r = \Pr(S(n_c, m_c) \subseteq \mathcal{G}(n_c, p)) = \binom{\binom{n_c}{2}}{m_c} p^{m_c} (1-p)^{\binom{n_c}{2} - m_c}. \tag{13}$$

Theorem 4. The expected number of occurrences of an induced subgraph with $n_c$
nodes and $m_c$ edges in a random graph with $n$ nodes and density $p$, is given by

$$E(X) = \binom{n}{n_c} r. \tag{14}$$

Proof. By linearity of expectation, we have $E(X) = \sum_H E(X_H)$, and because $X_H$ is
an indicator variable $E(X_H) = \Pr(X_H = 1)$. Notice that $H$ has $n_c$ nodes, so that
$H \in \mathcal{G}(n_c, p)$, and $\Pr(X_H = 1) = r$. There are $\binom{n}{n_c}$ subgraphs of $n_c$ nodes in a graph with
$n$ nodes, which concludes the proof.
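
As a rough numerical illustration of equations (13) and (14), the sketch below evaluates $\log r$ and $\log E(X)$ in log-space, so that large binomial coefficients do not overflow. The function names (`log_comb`, `log_r`, `log_expected_count`) are ours and purely illustrative, and the sketch assumes $0 < p < 1$.

```python
from math import lgamma, log, exp

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), for 0 <= k <= n."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def log_r(n_c, m_c, p):
    """log of r: probability that a G(n_c, p) graph has exactly m_c edges."""
    pairs = n_c * (n_c - 1) // 2  # number of possible edges on n_c nodes
    return log_comb(pairs, m_c) + m_c * log(p) + (pairs - m_c) * log(1 - p)

def log_expected_count(n, n_c, m_c, p):
    """log E(X) = log C(n, n_c) + log r."""
    return log_comb(n, n_c) + log_r(n_c, m_c, p)

# Small sanity check: triangles (n_c = 3, m_c = 3) in G(5, 1/2).
print(exp(log_expected_count(5, 3, 3, 0.5)))
```

For $n = 5$, $n_c = 3$, $m_c = 3$ and $p = 1/2$ this evaluates $\binom{5}{3} (1/2)^3 = 1.25$, up to floating-point rounding.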

For E(X 2 ) the idea is to calculate the expected value of the number of pairs of 
subgraphs that have m c edges. We do this by separating in three parts: the parts of the 
two subgraphs without overlap, and the part that overlaps. 

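Theorem 5. The expected squared number of occurrences of an induced subgraph
can be written as

$$E(X^2) = \binom{n}{n_c} r \sum_{u=0}^{n_c} \binom{n_c}{u} \binom{n - n_c}{n_c - u} \sum_{m(A)} \binom{M(u)}{m_c - m(A)} p^{m_c - m(A)} (1-p)^{M(u) - (m_c - m(A))} \tag{15}$$

with $M(u) = \frac{n_c(n_c - 1) - u(u - 1)}{2}$.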

Proof. The variable $X^2$ can be decomposed into parts $X_H \times X_{H'}$, such that we need to
investigate the probability that both $H$ and $H'$ have $m_c$ edges. So, we can separate this
expectation into parts of partially overlapping subgraphs, like

$$E(X^2) = \sum_{u} \sum_{|H \cap H'| = u} \Pr(\|H\| = \|H'\| = m_c), \tag{16}$$

where $u$ represents the overlap between the different subgraphs. If $H$ and $H'$ are
(edge) independent, so when $u \le 1$, the answer is simply given by $\Pr(X_H = 1)^2$. For
$u > 1$ the answer is more involved.

So let us consider two subgraphs $H$ and $H'$ such that $|H \cap H'| = u > 1$. Let us
separate this in three independent parts, the overlap $A = H \cap H'$, and the remainders
$B = H - A$ and $C = H' - A$. Clearly then, $|A| = u$, and $|B| = |C| = n_c - u$. The
probability that $\|H\| = \|H'\| = m_c$ can then be decomposed into the probability that
the edges of these independent parts sum to $m_c$. The probability that $\|H\| = m_c$ can be
decomposed as

$$\Pr(\|H\| = m_c) = \sum_{m(A)} \Pr(\|A\| = m(A)) \Pr(\|H\| = m_c \mid \|A\| = m(A)),$$

where $m(A)$ signifies the number of edges within $A$. Similarly, we arrive at the
conditional probability for both subgraphs $H$ and $H'$. However, since we have
conditioned exactly on the overlapping part, the two remaining parts are independent,
and we can write

$$\Pr(\|H\| = \|H'\| = m_c \mid \|A\| = m(A)) = \Pr(\|H\| = m_c \mid \|A\| = m(A))^2.$$



This probability can be calculated and yields

$$\Pr(\|H\| = m_c \mid \|A\| = m(A)) = \binom{M(u)}{m_c - m(A)} p^{m_c - m(A)} (1-p)^{M(u) - (m_c - m(A))},$$

where $M(u) = \frac{n_c(n_c - 1) - u(u - 1)}{2}$. We then obtain

$$\Pr(\|H\| = \|H'\| = m_c) = \sum_{m(A)} \Pr(\|A\| = m(A)) \binom{M(u)}{m_c - m(A)}^2 p^{2(m_c - m(A))} (1-p)^{2M(u) - 2(m_c - m(A))},$$



which leads to

$$r \sum_{m(A)} \binom{M(u)}{m_c - m(A)} p^{m_c - m(A)} (1-p)^{M(u) - (m_c - m(A))},$$

where $m(A)$ ranges from 0 to the minimum of $m_c$ and the number of possible edges
$\binom{u}{2}$.

Now counting the number of subgraphs that overlap in $u$ nodes, for each choice of
subgraph $H$, we choose $u$ nodes in $H$, and $n_c - u$ nodes in the remaining $n - n_c$ nodes.
In total, there are then

$$\binom{n}{n_c} \binom{n_c}{u} \binom{n - n_c}{n_c - u}$$

overlapping subgraphs with $u$ nodes in common. Concluding, we arrive at

$$E(X^2) = \sum_{u} \binom{n}{n_c} \binom{n_c}{u} \binom{n - n_c}{n_c - u} \Pr(\|H\| = \|H'\| = m_c \mid |H \cap H'| = u).$$



Writing this out, we arrive at equation (15). 
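
The counting argument in this proof can be checked exhaustively on a tiny graph. The sketch below is our own illustration, not the authors' code: it keeps the factor $\Pr(\|A\| = m(A))$ from the conditional decomposition explicitly, sums over overlaps $u$ with the pair count derived above, and compares the result against brute-force enumeration of all graphs on $n$ nodes. Exact rational arithmetic (`fractions.Fraction`) is an implementation choice so that the comparison is exact; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def binom_pmf(N, k, p):
    """Pr(Bin(N, p) = k), returning 0 outside the support."""
    if k < 0 or k > N:
        return Fraction(0)
    return comb(N, k) * p**k * (1 - p)**(N - k)

def first_moment(n, n_c, m_c, p):
    """E(X) = C(n, n_c) * r."""
    return comb(n, n_c) * binom_pmf(comb(n_c, 2), m_c, p)

def second_moment(n, n_c, m_c, p):
    """E(X^2) via the overlap decomposition, conditioning on edges in A."""
    total = Fraction(0)
    for u in range(n_c + 1):
        n_pairs = comb(n_c, u) * comb(n - n_c, n_c - u)  # choices of H' per H
        if n_pairs == 0:
            continue
        M = comb(n_c, 2) - comb(u, 2)  # possible edges outside the overlap
        inner = sum(binom_pmf(comb(u, 2), m, p) * binom_pmf(M, m_c - m, p)**2
                    for m in range(comb(u, 2) + 1))
        total += n_pairs * inner
    return comb(n, n_c) * total

def brute_force(n, n_c, m_c, p):
    """Enumerate every graph on n nodes; return E(X), E(X^2), Pr(X > 0)."""
    pairs = list(combinations(range(n), 2))
    subsets = list(combinations(range(n), n_c))
    ex = ex2 = ppos = Fraction(0)
    for mask in range(2 ** len(pairs)):
        edges = {pairs[i] for i in range(len(pairs)) if (mask >> i) & 1}
        prob = p**len(edges) * (1 - p)**(len(pairs) - len(edges))
        # X counts node subsets whose induced subgraph has exactly m_c edges.
        x = sum(1 for s in subsets
                if sum(1 for e in combinations(s, 2) if e in edges) == m_c)
        ex += prob * x
        ex2 += prob * x * x
        if x > 0:
            ppos += prob
    return ex, ex2, ppos

p = Fraction(1, 3)
ex, ex2, ppos = brute_force(5, 3, 2, p)
print(ex == first_moment(5, 3, 2, p), ex2 == second_moment(5, 3, 2, p))
print(ex**2 / ex2 <= ppos <= ex)  # the bounds of equation (12)
```

The enumeration is exponential in the number of node pairs, so it is only meant for very small $n$; its purpose is to confirm the bookkeeping, not to estimate probabilities in practice.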

We consider subgraphs of size $sn$, with $0 < s < 1$, with fixed density $q$. For the
asymptotic analysis, we can afford to be a bit sloppy with this density, and consider
$(sn)^2$ possible edges in the subgraph of $sn$ nodes, so that $m_c = q(sn)^2$, and we now
denote by $S(sn, q)$ the subgraphs with density $q$ instead of the actual number of edges.

Theorem 6. The probability for a dense subgraph can be bounded below and above
asymptotically as

$$\Pr(S(sn, q) \subseteq \mathcal{G}(n,p)) = e^{\Theta\left(-(sn)^2 D(q \| p)\right)}, \tag{17}$$

where $D(q \| p)$ is the Kullback-Leibler divergence

$$D(q \| p) = q \log \frac{q}{p} + (1 - q) \log \frac{1 - q}{1 - p}. \tag{18}$$



Proof. We prove the asymptotic result by showing that both an upper and a lower
bound have a similar asymptotic behaviour. The upper and lower bounds are provided
by Markov's and Cauchy-Schwarz's inequality as stated in equation (12). We
will first prove the upper bound. Taking logarithms on Stirling's approximation, we
obtain that

$$\log \binom{n}{sn} \sim nH(s),$$

where $H(p)$ is the binary entropy

$$H(p) = -p \log p - (1 - p) \log(1 - p). \tag{19}$$

We apply this to $E(X) = \binom{n}{sn} r$ with $r$ as in equation (13) and we obtain

$$\log E(X) \sim nH(s) + (sn)^2 H(q) + \log\left(p^{q(sn)^2} (1 - p)^{(1 - q)(sn)^2}\right),$$

which can be simplified to $\log E(X) \sim nH(s) - (sn)^2 D(q \| p)$, utilising the binary
Kullback-Leibler divergence 28

$$D(q \| p) = q \log \frac{q}{p} + (1 - q) \log \frac{1 - q}{1 - p}, \tag{20}$$

which yields the upper bound by Markov's inequality.
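
The simplification used in this step rests on the identity $H(q) + q \log p + (1 - q) \log(1 - p) = -D(q \| p)$. A minimal numerical check is sketched below; the function names are ours, and the sketch assumes $0 < q < 1$ and $0 < p < 1$ so that all logarithms are finite.

```python
from math import log

def binary_entropy(q):
    """H(q) = -q log q - (1 - q) log(1 - q), natural logarithm."""
    return -q * log(q) - (1 - q) * log(1 - q)

def kl_divergence(q, p):
    """Binary Kullback-Leibler divergence D(q || p)."""
    return q * log(q / p) + (1 - q) * log((1 - q) / (1 - p))

def per_edge_log_rate(q, p):
    """Coefficient of (sn)^2 in log E(X) before simplification."""
    return binary_entropy(q) + q * log(p) + (1 - q) * log(1 - p)

q, p = 0.3, 0.1
# The two numbers printed should agree up to rounding.
print(per_edge_log_rate(q, p), -kl_divergence(q, p))
```

Since $D(q \| p) > 0$ whenever $q \neq p$, the quadratic term $-(sn)^2 D(q \| p)$ eventually outweighs the linear term $nH(s)$.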

We need the second moment for the lower bound. This can be rewritten as
$E(X^2) = E(X) \sum_u \sum_{m(A)} f(u, m(A))$, with

$$f(u, m(A)) = \binom{n_c}{u} \binom{n - n_c}{n_c - u} \binom{M(u)}{m_c - m(A)} p^{m_c - m(A)} (1-p)^{M(u) - (m_c - m(A))}. \tag{21}$$

By the Cauchy-Schwarz inequality, we want that

$$\log \frac{E(X)^2}{E(X^2)} = \log E(X) - \log \sum_u \sum_{m(A)} f(u, m(A))$$

increases as $-(sn)^2 D(q \| p)$. We know that by Jensen's inequality we have

$$\log \frac{E(X)^2}{E(X^2)} \ge \log E(X) - \sum_u \log \sum_{m(A)} f(u, m(A)). \tag{22}$$

Using the notation $u = \alpha s n$ we can write

$$f(\alpha, m(A)) = \binom{sn}{\alpha s n} \binom{(1 - s)n}{s(1 - \alpha)n} \binom{(1 - \alpha^2)(sn)^2}{m_c - m(A)} p^{m_c - m(A)} (1-p)^{(1 - \alpha^2)(sn)^2 - (m_c - m(A))}.$$

We can bound

$$\sum_{m(A) = 0}^{(sn)^2 \min(\alpha^2, q)} \binom{(1 - \alpha^2)(sn)^2}{m_c - m(A)} p^{m_c - m(A)} (1-p)^{(1 - \alpha^2)(sn)^2 - (m_c - m(A))}$$

by

$$\sum_{k = 0}^{q(sn)^2} \binom{(1 - \alpha^2)(sn)^2}{k} p^k (1-p)^{(1 - \alpha^2)(sn)^2 - k}$$

with $k = m_c - m(A)$, in which we recognize the binomial cumulative probability
$\Pr(Y \le m_c)$ where $Y$ is the number of edges in the overlapping part. By Hoeffding's
inequality this can be bounded by

$$\exp\left(-2(sn)^2 \frac{\left(p(1 - \alpha^2) - q\right)^2}{1 - \alpha^2}\right).$$
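
To see how the Hoeffding step behaves numerically, the sketch below compares an exact binomial lower-tail probability with the bound $\exp(-2(Np - k)^2 / N)$. Setting $N = (1 - \alpha^2)(sn)^2$ and $k = q(sn)^2$ recovers the expression used in the proof. The helper names are hypothetical, and the bound is only applied in the lower tail, $k \le Np$, which is an assumption of this sketch.

```python
from math import comb, exp

def binom_cdf(N, k, p):
    """Exact Pr(Y <= k) for Y ~ Binomial(N, p)."""
    return sum(comb(N, j) * p**j * (1 - p)**(N - j) for j in range(k + 1))

def hoeffding_lower_tail(N, k, p):
    """Hoeffding bound exp(-2 (Np - k)^2 / N) on Pr(Y <= k), for k <= N p."""
    if k > N * p:
        raise ValueError("bound only applies to the lower tail, k <= N * p")
    return exp(-2.0 * (N * p - k) ** 2 / N)

N, k, p = 200, 60, 0.5
# The exact tail probability should not exceed the Hoeffding bound.
print(binom_cdf(N, k, p), hoeffding_lower_tail(N, k, p))
```

The bound is loose, but it decays exponentially in $(sn)^2$ for fixed $\alpha$, $p$ and $q$, which is the only property the proof relies on.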

Combining with our earlier result on $E(X)$, we then have

$$\log \frac{E(X)^2}{E(X^2)} \ge -(sn)^2 D(q \| p) + nH(s) - \left( snH(\alpha) + (1 - s)nH\left(\frac{(1 - \alpha)s}{1 - s}\right) \right) + 2(sn)^2 \frac{\left(p(1 - \alpha^2) - q\right)^2}{1 - \alpha^2}. \tag{23}$$

For large enough $n$ the quadratic term dominates, and we obtain
$\log \frac{E(X)^2}{E(X^2)} \ge -(sn)^2 D(q \| p)$, giving the lower bound. By combining the lower and
upper bound we obtain the asymptotic result stated in the theorem.

Optimizing significance. As is common in the Louvain method 29 , we look at the
difference of moving some node. However, we also need to aggregate the graph, and
still correctly move communities. For that we need the node size $n_i$, similar to
CPM 14 , which initially is $n_i = 1$. Upon aggregating the graph the node size is set to the
sum of the node sizes within a community. Moving node $i$ of size $n_i$ from community $r$ to $s$,
with $e_{ir}$ edges to community $r$ and $e_{is}$ edges to community $s$, gives a difference in
significance of

$$\Delta S(\sigma) = \binom{n_r - n_i}{2} D(q'_r \| p) - \binom{n_r}{2} D(q_r \| p) + \binom{n_s + n_i}{2} D(q'_s \| p) - \binom{n_s}{2} D(q_s \| p),$$

where $q'_r = \frac{m_r - e_{ir}}{\binom{n_r - n_i}{2}}$ and $q'_s = \frac{m_s + e_{is}}{\binom{n_s + n_i}{2}}$.
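
The node move described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each community contributes $\binom{n_c}{2} D(q_c \| p)$ to the significance, with $q_c$ the community density, it takes the difference as the value after the move minus the value before, and it treats node $i$ as having no internal edges (as for an unaggregated node with $n_i = 1$). All function and parameter names are hypothetical.

```python
from math import comb, log

def kl(q, p):
    """Binary KL divergence D(q || p), using the convention 0 log 0 = 0."""
    total = 0.0
    if q > 0:
        total += q * log(q / p)
    if q < 1:
        total += (1 - q) * log((1 - q) / (1 - p))
    return total

def community_term(n_c, m_c, p):
    """Assumed contribution of one community: C(n_c, 2) * D(q_c || p)."""
    pairs = comb(n_c, 2)
    if pairs == 0:
        return 0.0  # a community with fewer than two nodes has no node pairs
    return pairs * kl(m_c / pairs, p)

def delta_significance(n_r, m_r, n_s, m_s, n_i, e_ir, e_is, p):
    """Change in significance when node i moves from community r to s."""
    before = community_term(n_r, m_r, p) + community_term(n_s, m_s, p)
    after = (community_term(n_r - n_i, m_r - e_ir, p)
             + community_term(n_s + n_i, m_s + e_is, p))
    return after - before

# Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3 (n = 6, m = 7).
# Start with r = {2,3,4,5}, s = {0,1}, and move node 2 from r to s.
p = 7 / 15
print(delta_significance(n_r=4, m_r=4, n_s=2, m_s=1,
                         n_i=1, e_ir=1, e_is=2, p=p))
```

In this toy example the move restores the two triangles as communities, so the change is positive. A Louvain-style loop would evaluate such differences for each neighbouring community and apply the best move.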